A data-driven robust EVaR-PC with application to portfolio management

We investigate the robust chance constrained optimization problem (RCCOP), which is a combination of the distributionally robust optimization (DRO) and the chance constraint (CC). The RCCOP plays an important role in modeling uncertain parameters within a decision-making framework. The chance constraint, which is equivalent to a constraint of Value-at-risk (VaR), is approximated by risk measures such as Entropic Value-at-risk (EVaR) or Conditional Value-at-risk (CVaR) due to its difficulty to be evaluated. An excellent approximation requires both tractability and non-conservativeness. In addition, the DRO assumes that we know partial information about the distribution of uncertain parameters instead of their known true underlying probability distribution. In this article, we develop a novel approximation EVaR- PC based on EVaR for CC. Then, we evaluate the proposed approximation EVaR- PC using a discrepancy-based ambiguity set with the wasserstein distance. From a theoretical perspective, the EVaR- PC is less conservative than EVaR and the wasserstein distance possesses many good theoretical properties; from a practical perspective, the discrepancy-based ambiguity set can make full use of the data to estimate the nominal distribution and reduce the sensitivity of decisions to priori knowledges. To show the advantages of our method, we show its application in portfolio management in detail and give the relevant experimental results.


Introduction
In engineering and management, decision makers often need to make decisions based on uncertain parameters. Typically, for example, inventory managers need to decide on order quantities based on uncertain demands, and financial investors need to decide on investment weights based on uncertain asset returns. In general, uncertainty in these parameters may be due to limited observability of data, measurement errors, and prediction errors. In many mathematical optimization models, these uncertain parameters are assumed to be completely known nominal values according to prior knowledge or estimation from historical data. But once the real data deviates from our assumptions, it leads to poor decisions. The robust chance constrained optimization problem (RCCOP) plays an important role in modeling uncertain parameters within a decision-making framework. The chance constraints assume that the uncertain parameters follow some unknown probability distributions and evaluate the violation probability of constraints with uncertain parameters [1][2][3]. The chance constraint is defined as: X a ≔fx 2 X j Prfgðξ; xÞ � bg � 1 À ag; ðCCÞ . . . ; x p Þ > 2 X � R p are the decision variables. The threshold constant b in the constraint are deterministic parameters. The uncertain parameters ξ = (ξ 1 , ξ 2 , . . ., ξ p ) > in the constraint are random variables with unknown probability distribution. And Pr(A) is the probability of an event A occurs and g(�) is a random function. In (CC), the constraint g(ξ, x) � b is required to be satisfied with a probability at least 1 − α for some 0 < α < 1. Taking portfolio management as an example, the vector of uncertain parameters ξ = (ξ 1 , ξ 2 , . . ., ξ p ) > represents the loss of the p assets, the decision variable x = (x 1 , x 2 , . . ., x p ) > is the corresponding investment weight and b represents the maximum loss that investors can bear. This chance constraint requires that the loss of investment g(ξ, x) = ξ > x is less than b with a minimum probability of 95% if α = 0.05 and the violation probability Pr{g(ξ, x) � b} is required to be less than 5%. Note that (CC) is equivalent to the value-at-risk (VaR)-based constraint: X a ≔fx 2 X j VaR 1À a ðgðξ; xÞÞ � bg; where VaR 1−α (Y) = inf{s : Pr(Y � s) � 1 − α} for any random variable Y. Although VaR, as an excellent risk measure, is widely used in risk control in various fields, it is neither a convex constraint nor has a closed-form. These shortcomings lead to high computational complexity to evaluate the violation probability. Many studies attempts to address this difficulty [4][5][6]. An extremely important method is to find convex conservative approximations to the chance constraint. Two well-known convex approximations are conditional value-at-risk(CVaR [7]) and the entropic value-at-risk (EVaR [8]), respectively. To reduce the conservatism of the approximation, [9] proposed the DC approximation, which is the difference between two convex functions. In addition to these risk measures associated with the chance constraint approximation, there are a number of risk indicators [10][11][12][13], which are beyond the scope of our paper. The robust chance constrained optimization problem (RCCOP) evaluates the chance constraint through the paradigm of distributionally robust optimization (DRO), which assumes that the uncertain parameters belong to ambiguity sets rather than that the true probability distributions of the uncertain parameters are completely known [3]. From the perspective of DRO, decision makers need to make decisions based on the worst-case in the ambiguity set. Then, the probabilistic constraint in (CC) is turned to a robust chance constraint: where the P represents the ambiguity set containing all probability distributions with known partial information about the uncertain parameters obtained from historical data or domainspecific knowledge. The (RCC) is the feasibility set of (CC) because x 2 � X a implies x 2 X a , if the true probability distribution Pr True 2 P.
Research on stochastic optimization assumes that the ambiguity set P only contains the true probability distribution [14][15][16][17][18], whereas traditional robust optimization assumes that the ambiguity set P contains all probability distributions on the support of the uncertain parameters [19][20][21][22]. However, the optimal solution may be sensitive to the pre-specific probability distribution in stochastic optimization [23][24][25] and may be too conservative in traditional robust optimization due to ignored other distributional information. Thus, the ambiguity set P of DRO can effectively balance advantages and disadvantages of these two optimization methods.
The ambiguity set of DRO is typically divided into two types: moment-based and discrepancy-based ambiguity sets. The moment-based ambiguity sets contain probability distributions whose moments satisfy certain properties, while discrepancy-based ambiguity sets contain probability distributions that are close to a nominal distribution in the sense of some discrepancy measure [3]. There have been many researches on moment-based ambiguity set [26]. The ambiguity set using wasserstein distance as the discrepancy measure, called Wasserstein metric of order m, belongs to the discrepancy-based ambiguity set. [25,27,28] study the DRO based on ambiguity sets of Wasserstein metrics. [29] investigates a problem of distributionally robust individual chance constraint with ambiguity sets using 1-Wasserstein metric. [25] proposes exact methods to solve the stochastic programming with a data-driven chance constraint problem. For a given historical data set, confidence sets for two types of uncertain distributions are constructed by nonparametric statistical estimation of their density functions and moments. [30] studies a data-driven distributionally robust chance-constrained problem, where the ambiguity set of the distributions is formed by the m-Wasserstein metric using arbitrary norm.
Although the ambiguity set using Wasserstein distance is a powerful modeling paradigm for characterizing uncertainty, the challenging computation arising from the non-convex feasible region due to the (CC) remains a major focus of current research. A stream of approaches to address this challenge is to identify sufficient conditions to reformulate the original problem (RCC) into a convex optimization or mixed-integer programming (MIP), which are both computationally tractable. [31] proves the convexity of the feasible region when the reference distribution of the Wasserstein ambiguity set is Gaussian or log-concave. They derive a convex and conic representation and propose a block coordinate ascent algorithm, respectively, to effectively solve the chance constraint problems. [32] derives tractable mixed-integer linear or conic programming reformulations for several variants of distributionally (RCC) with Wasserstein ambiguity set under two types of uncertainties (i.e., uncertain probabilities vs. continuum of realizations). [33] also provide a mixed-integer conic program reformulation for chance constraint with right-hand side uncertainty. Meanwhile, as mentioned before, replacing chance constraint with its convex conservative approximations is also an important approach to transform the (CC) into convex optimization. The significance of our proposed EVaR-PC lies in improving the conservatism of the widely used convex conservative approximation EVaR and demonstrating its computational tractability in portfolio management.
The contributions of this paper contain the following.

A tighter upper bound
EVaR-PC based on EVaR for violation probability is developed, which takes existing approximation approaches as a special case.
2. We show how to evaluate the proposed approximation EVaR-PC based on the m-Wasserstein metric using arbitrary norm.
3. This article explains how to apply our proposed model of RCCOP to portfolio management, and illustrates the advantages of our method with experimental results.

Outline
This paper is organized as follows. In §Methods, we propose the EVaR-PC approximation and introduce the m-Wasserstein metric-based ambiguity sets for DRO. An application in portfolio management is shown in §Portfolio management. In §Conclusions, we give a summary of this article and potential research content in the future.

Methods
In this section, we propose a general family of upper-bounds which enjoy nice properties of the DC approximation but are more flexible to introduce more variants, and EVaR-PC approximation is a member of this family. In order to evaluate these conservative approximations in the context of DRO, we then introduce the definition of the discrepancy-based ambiguity sets using m-Wasserstein metric and related reformulation for final the optimization problem.

A tighter upper bound for violation probability
As described in the previous section, the (CC) is equivalent to the VaR-based constraint. The CVaR and EVaR are two conservative approximations to VaR. Specifically, the approximations for VaR-based constraint are Þg for any random variable Y.
As discussed in §Introduction, many existing literature have proposed good upper-bounds (or conservative approximations) for the violation probability under the (ARCC) setting, such as CVaR approximation (1) by [8], and EVaR approximation (2) by [8], among others. A distinct approach proposed by [9] uses a difference between two convex functions to define the DC upper-bound: Although the resulting approximation is not convex, it is much tighter than other convex approximations. The original motivation of [9] is first to "shift F CVaR (g(ξ, x) − b, t) to the right side by a distance of t". We observe that the resulting function is equivalent to "the positive part of shifting F CVaR (g(ξ, x) − b, t) down by a distance of 1". That is 1 t ðgðξ; xÞ À bÞ þ ¼ ½F CVaR ðgðξ; xÞ À b; tÞ À 1� þ . Inspired by such an approach, we propose a

PLOS ONE
general family of upper-bounds which enjoy nice properties of the DC approximation but are more flexible to introduce more variants.
Without losing generality, we study the mathematical properties of these upper-bounds by assuming b = 0 and linear relationship g(ξ, x) = ξ > x, which is very suitable for portfolio management. Let F 0 (ξ > x, t) be any upper-bound of 1 ð0;1Þ ðξ > xÞ where t > 0 such that F 0 ðξ > x; tÞ � 1 ð0;1Þ ðξ > xÞ for any ξ > x 2 R. We then define the following family of approximations for the indicator function by taking the difference between the original upper-bound and its vertical shift as Since the upper-bounds in our proposed family always take value 1 for ξ > x � 0, we call (4) a positive-capped (PC) approximation. For any t > 0, Hence, we can let F PC ðξ > x; tÞ ¼ inf t>0 fC½F 0 ðξ > x; tÞ�g, which is the best approximation among all.
which is always nonnegative. Therefore, C½F 0 ðξ > x; tÞ� is nondecreasing in t for any t > 0. It can be seen that the DC approximation (3) belongs to this general family by setting Similarly, we can use the EVaR approximation (2) in (4) to obtain the EVaR based PC approximation as Although not being convex anymore, the PC family generally provides a much tighter upperbound than many conventional convex approximations, such as CVaR (1) and EVaR (2). We also want to make a remark that while both being members of the PC family, the EVaR-PC approximation contains only one non-differentiable point at ξ > x = 0 whereas the DC approximation, equivalent to CVaR-PC approximation, has two non-differentiable points at ξ > x = 0 and ξ > x = −t. Theorem 1. Let E½Fðgðξ; xÞ À bÞ� in (ARCC) be inf t>0 fEðF EVaRÀ PC ðgðξ; xÞ À b; tÞÞg. Then (ARCC) converges to (RCC). Left: This plot compares the convex approximation CVaR for the indicator function to its corresponding PC approximation that is equivalent to the DC approximation proposed by [9]. Right: This plot compares the convex approximation EVaR for the indicator function to its corresponding PC approximation.  When g(ξ, x)  which completes the proof.
Theorem 1 provides a connection between the proposed novel upper-bounds in the PC family and the original chance constraint. This conclusion is also applicable to the special case of b = 0 and linear relationship g(ξ, x) = ξ > x.

The m-Wasserstein metric
The Wasserstein distance is a function that can measure the distance between probability distributions on a given metric space and is also called the "earth mover's distance". If each distribution is viewed as a unit amount of earth piled on the given metric space and we need to turn one pile into the other, Wasserstein distance is assumed to be the minimum amount of earth that needs to be moved times the mean distance it has to be moved.
Definition 1. The f Y is the probability density function (p.d.f.) for continuous random variable Y 2 X. Then, for m � 1, the m- where Gðf Y 1 ; f Y 2 Þ denotes the collection of all measures on X × X with marginals f Y 1 and f Y 2 on the first and second factors respectively. Definition 2. The p Y (y) = Pr(Y = y) is the probability mass function (p.m.f.) for discrete random variable Y on finite domain X. For m � 1, the m-Wasserstein distance in (5) becomes Next, we present methods for solving optimization problems with discrepancy-based ambiguity sets using m-Wasserstein metric. The decision making problem can be formulated as follows: where θ > 0 is a chosen radius and ξ 2 X � R p . And the f 0 is a nominal distribution, such as

PLOS ONE
an empirical distribution, that estimated by historical data. To solve the problem (6), [34] proves the strong duality result using a novel, elementary, constructive approach. For the completeness of our article, we show one of their dual results for finite domain, which is also closely related to our research. Example 1. When X is finite, that is X = {ξ 1 , ξ 2 , . . ., ξ N }, the nominal distribution p 0 is given by p 0 (ξ j ) for j = 1, 2, . . ., N. Then, the problem (6) becomes: where γ j,k = γ(ξ j , ξ k ).

Portfolio management
In this section, we first introduce the experimental data and the portfolio investment management strategy, and then demonstrate and discuss the results of the data experiment.

Experimental data and algorithm
Our portfolio management strategy is based on financial assets, and the experimental data are from the financial client Wind. We select 5 globally important market indexes as investment objects. Their codes in the financial client Wind are as follows: IBOVESPA.GI, IXIC.GI, FTSE. GI, AS51.GI and 000001.SH. The experimental data are the weekly returns of these market indexes from October 2017 to October 2022, and our strategy requires weekly adjustment of investment weights based on changes in these weekly data. In our experiment, ξ represents loss or negative return, the decision variable x represents investment weight and then g(ξ, x) = ξ > x. We use historical data from the past year to estimate ξ, so the elements in the set X = {ξ 1 , ξ 2 , . . ., ξ N } are the losses over the past N = 52 weeks. The empirical distribution p 0 ðξ j Þ ¼ 1 N for j = 1, 2, . . ., N is used as the nominal distribution for m-Wasserstein metric. The problem (7) is our primary original experimental model. In addition, a supplementary constraint 1 > x = 1 for x � 0 is added to indicate that short selling is not allowed. For the hyperparameters, let m = 1 to simplify the calculation, b = 0 and θ selected from set {0.1, 0.5, θ N (β)}, where θ N (β) is the minimum radius of Wasserstein ball that contains the true distribution with confidence 1 − β for some prescribed β 2 (0, 1) based on a finite sample (i.e., N = 52) guarantee. According to the results from [35], we have for all N � 1, M 6 ¼ 2 and θ > 0. The c 1 and c 2 are positive constants, which are set to 0.1 and 2 respectively. Let M < 2 and β = 0.05. And then, it holds that the minimum radius θ N (β) � 0.0816. The main experimental algorithm steps are as follows: (7). Solve the strong dual problem (8) to get its optimal solution x*.
3. Based on the worst-case distribution p*(ξ j ) and F EVaRÀ PC ðξ > j x * ; tÞ, we can have 5. Calculate out-of-sample return performance for the weekly strategy by the investment weightsx.
It is worth noting that the optimal t for inf t>0 fEðF EVaR ðξ > x; tÞÞg can be obtained by searching method [36]. To better reflect the experimental performance of our proposed method, we use two benchmarks as comparison. We show in detail how these two benchmarks are calculated: 1. MV: min x x > ∑x and P p i x i ¼ 1 for x i � 0, i = 1, . . ., p. The ∑ represents the covariance matrix of the returns on p assets.
2. Mean: Let x i ¼ 1 p for i = 1, . . ., p. This method can reflect the average return of these investment objects.

Experimental results and discussion
As mentioned above, an excellent approximation requires both tractability and non-conservativeness. Although, CVaR is the "tightest" approximation, it has no explicit closed-form. EVaR is very solvable due to its convexity, but it's a very conservative approximation. Therefore, we propose EVaR-PC approximation based on EVaR, which can effectively reduce the conservatism of EVaR. The EVaR-PC approximation is a tighter upper bound for violation probability, which means it is a risk measure. In our model setting, the value of EVaR-PC represents the upper bound on the probability that an investment strategy may lose money. For example, if inf t>0 fE½F EVaRÀ PC ðξ > x; tÞ�g ¼ 0:05 for b = 0, it means that the current investment strategy may suffer losses with a probability of less than 5%.
The main purpose of our experiment is to examine two questions: (1) Will the EVaR-PCbased strategy lead to an improvement in the performance of the EVaR-based strategy? (2) What are the effects of different risk thresholds on the experimental results? To answer these two questions, we present the net asset value curves of different strategies and calculate the corresponding measures that are widely used in the financial field. The whole experiment tested the performance of weekly investment 204 times, so the net asset value is defined as Q T t¼1 ð1 À ξ > t xÞ for T = 1, . . ., 204, where −ξ t is the return of the asset at week t. The Maximum Drawdown is one of the important financial indicators of the maximum loss that investors may suffer over the investment horizon. The results in Fig 2 and Table 1 can answer the first question: The EVaR-PC-based strategy lead to a significant improvement in the performance of the EVaR-based strategy.
It can be seen that radius_0.1_PC's and radius_0.5_PC's net asset value curves are always above radius_0.1's and radius_0.5's respectively throughout the experimental period in Fig 2. Higher net asset value means better performance in terms of return on assets. In addition, in the Table 1, radius_0.1 and radius_0.5 have annualized returns of 9.15% and 5.97%, while radius_0.1_PC and radius_0.5_PC can achieve annualized returns of 14.17% and 6.48%, respectively. The EVaR-PC-based strategies' annualized returns, especially of the radius_0.1_PC, are also significantly higher than that of the two benchmarks Mean and MV, which indicates that our proposed method has important significance in terms of returns. At the same time, from the point of view of risk control, our proposed method also has enough advantages. Although the Std of our proposed strategies are respectively equal to their corresponding EVaR-based strategy and slightly higher than MV's, our Maximum Drawdown 23.31% and 21.54% are the lowest among all methods. As we all know, due to some exogenous factors, the global financial market indexes almost all experienced significant declines in the first quarter of 2020. In the face of this systemic risk, radius_0.1 and radius_0.5, like other benchmarks, inevitably saw their returns fall sharply. However, we can see from Fig 2 that the decline of the net asset value of the radius_0.1_PC and radius_0.5_PC are significantly lower than that of others, which shows that our methods have effective risk control ability. In addition to comparing returns and risks separately, the Sharpe ratio is a financial indicator that balances both returns and risks. And the bigger the Sharpe ratio, the better. When θ = 0.1 and 0.5, our strategies can increase the Sharpe ratio from 0.52 and 0.36 to 0.79 and 0.40, respectively.
For the second question, we illustrate it in terms of hyperparameters θ and α. The θ is the radius of ambiguity sets using m-Wasserstein metric and controls the size of the ambiguity sets. In theory, the larger the ambiguity sets, the more conservative the optimal solution is ; but the smaller it is, the lower the probability that the ambiguity sets contain the true distribution. Here, we compare the performance of the portfolio under different Wasserstein radius (i.e., θ = 0.1, 0.5, θ N (β)). The strategies based on θ = θ N (β) are called radius_min and radius_min_PC. It can be seen from the Table 1 that, consistent with the conclusions of θ = 0.1 and 0.5, the radius_min_PC is superior to the strategy radius_min. Comparing the two strategies, not only did the annualized return increase from 9.21% to 9.23%, but the Maximum Drawdown decreased by about 8.61% from 29.58% to 20.97%. In terms of these EVaR-PC-based strategies, Fig 3 shows that the net asset value curves of radius_0.1_PC and radius_min_PC are higher than that of radius_0.5_PC, which means that the first two strategies are less conservative than the latter. Meanwhile, we can observe from the Table 1 that all strategies with Wasserstein ambiguity set have significantly higher Sharpe ratios than the two benchmarks including Mean(0.31) and MV(0.09). This also suggests that the Wasserstein radius we have chosen are large enough for the Wasserstein ball to contain the true distribution and result in good performance.
To clearly observe the influence of α on the experimental results, we present the net asset value curves of strategies with different α in Fig 4 and show their corresponding measures in Table 2. Fig 2. Comparisons of different net asset value curves for different approximations. Left: θ = 0.1. The method radius_0.1 is a EVaR-based strategy calculated by weights x*, while radius_0.1_PC is calculated by its corresponding weightsx. Right: θ = 0.5. The method radius_0.5 is a EVaR-based strategy calculated by weights x*, while radius_0.5_PC is calculated by its corresponding weightsx.

PLOS ONE
In this part, we take radius_0.1 as the benchmark. Theoretically speaking, the smaller α is, the stricter the risk control is. In the Fig 4, we can find that when α is larger, the net asset value curve is higher and its fluctuation is larger. Specifically, the annualized return 9.15% of radius_0.1 is less than that of alpha_0.2's 9.8% and more than that of alpha_0.05's 4.37% and alpha_0.1's 8.63%. In terms of risk, the Std and Maximum Drawdown of alpha_0.05 and alpha_0.1 are dramatically lower than those of alpha_0.2. In particular, the Std and Maximum Drawdown of alpha_0.05 are only 0.05 and 3.95%, respectively. In general, these experimental conclusions are basically consistent with the theoretical properties of α. In addition, the net asset value curves of alpha_0.05 and alpha_0.1 didn't go down at all during the systemic risk in the first quarter of 2020, while the net asset value curve of alpha_0.2 shows obvious fluctuations. And for Sharpe ratio, the alpha_0.2 is also significantly lower among the three methods with different α. Therefore, the alpha_0.2 may be too risky for a risk-averse investor.
In general, our EVaR-PC approximation can more accurately describe the violation probability and increase the risk control capability. Compared with the original EVaR-based

PLOS ONE
investment weights, the main difference of our algorithm strategy is that: When the violation probability is less than 5%, we increase the investment weights; when the violation probability is greater than 20%, we give up investment to avoid risks. For investors with different levels of risk aversion, different hyperparameters related to decision making may be selected, such as risk thresholds.

Conclusions
We investigate the robust chance constrained optimization problem, which plays an important role in modeling uncertain parameters within a decision-making framework. The chance constraint can tell us the violation probability of the event, so as to help us make the optimal decision and avoid the risk. However, since the chance constraint is difficult to evaluate, many conservative approximations have been proposed, such as EVaR and CVaR. To more accurately describe the risk, we propose the PC approximation, which can enjoy nice properties of the DC approximation but are more flexible to introduce more variants. In this paper, we  evaluate the EVaR-PC approximation through the discrepancy-based ambiguity sets using m-Wasserstein metric. The discrepancy-based ambiguity set can make full use of the data to estimate the nominal distribution and reduce the sensitivity of decisions to priori knowledges. And we show how to apply our RCCOP model in portfolio management. The numerical experiment results show that our EVaR-PC approximation can describe the violation probability more accurately and help us improve the risk control ability. Investors with different risk preferences can choose different risk thresholds to adjust the style of their strategies. Theoretically, our proposed EVaR-PC effectively reduces the conservatism of EVaR. In addition, we use m-Wasserstein metric-based ambiguity sets that help improve the robustness of decisions. In practice, the experimental algorithms that we show can provide necessary help for portfolio management. We can continue our research in two aspects in the future: (1) Integrating with data mining and machine learning to full use of the value of data. (2) Thinking about how to make the ambiguity sets more suitable for the case of small samples.